Abstract

In this paper we present a methodology for classifying hepatic (liver) lesions
using multidimensional persistent homology, the matching metric (also called
the bottleneck distance), and a support vector machine. We present our
classification results on a dataset of 132 lesions that have been outlined and
annotated by radiologists. We find that topological features are useful in the
classification of hepatic lesions. We also find that two-dimensional persistent
homology outperforms one-dimensional persistent homology in this application.

Keywords: medical image processing; image classification; persistent 
homology; computational topology 



1. Introduction 

Medical imaging technology gives doctors access to portions of the human body
that are otherwise inaccessible to the human eye. Inspecting these medical
images is often a labor-intensive process performed by diagnostic radiologists.
The accuracy of the radiologist is obtained through training and experience
[13], but even with extensive training and experience there are variations in
interpretation and accuracy among radiologists [15]. Despite an increasing
emphasis on evidence-based medicine and improved imaging techniques,
quantitative 'gold standards' and clear guidelines for a radiologist's role in
quantitative measurements remain elusive [12]. Image processing provides a way
both to automate portions of the examination and to provide standard tools for
radiologists to use when reading an image. The qualitative nature of many
radiological observations suggests that topological features may be useful in
the classification and interpretation of medical images.

In this paper, we explore automatic classification methods for computed
tomography (CT) scans of hepatic (liver) lesions. We have a dataset of CT scans
of 132 hepatic lesions along with an outline, diagnosis, and semantic
descriptors of each lesion provided by a radiologist. There are nine lesion
types represented in the data, with the vast majority of the lesions (90
lesions) evenly split between cysts and metastases, followed by hemangiomas (18
lesions), hepatocellular carcinomas (HCC, 11 lesions), focal nodules (5
lesions), abscesses (3 lesions), neuroendocrine neoplasms (NeN, 3 lesions), a
single laceration and a single fat deposit.

Figure 1: Abdominal CT Scan

It has been demonstrated that semantic features are useful for classification
of hepatic lesions [13]. This indicates that visually identifiable structures
exist within the lesions, but it has been difficult to find quantitative
methods of defining these structures. For example, consider the six images in
Figures 3 and 4. The first three show the abscesses contained in our dataset.
The second three show hemangiomas (deformations of blood vessels). The
abscesses present what is called 'cluster of grapes' morphology, but the
arrangement of this structure (the clusters of grapes) is very different in
each lesion. Similarly, the hemangiomas show the characteristic large dark
central region with dense white regions on the outer edge of the lesion. Yet
the hemangiomas lack a rotational orientation, the number of regions of each
type differs, and the formations vary in size and shape. The qualitative nature
of these observations has made it difficult to find quantitative measures of
the structures.

1.1. Prior Work 

As mentioned above, semantic features have been one successful method for
classifying the liver lesions [13]. This has led to preliminary investigations
into using computational features to predict semantic features, which can be
used to classify the lesions pj]. Additionally, computational features have
shown some success in directly classifying liver lesions [TTJ [T31 E]. Most of
these studies use a large number of various types of features (intensity
histograms, wavelets, boundary features, etc.) to classify the lesions. Shape
descriptors for liver lesions have also been investigated and found to work
well in retrieving similar lesions.

Figure 2: Lesion Diagnoses. Panels: (a) Cyst, (b) Metastasis, (c) Hemangioma,
(d) HCC, (e) Focal Nodule, (f) Abscess, (g) NeN, (h) Laceration, (i) Fat Deposit

Figure 3: Abscesses

Figure 4: Hemangiomas

Persistent homology [21[IHj is an approach to extending the notion of shape to 
point clouds or finite metric spaces, which has been developed over the last 10-15 
years. It has two directions of application, one which gives understanding of the 
overall organization of data sets (see e.g. [H HI [14]), and a second which applies
to data sets consisting of data points which themselves have complex structure, 
such as databases of images or of chemical structures. This paper approaches 
radiological images from the second point of view. Persistent homology depends 
on a family of simplicial complexes parametrized by a real variable. In many 
applications, including the ones in the first direction, this parameter is simply a 
scale variable, which measures distances between points. In applications in the 
second direction, however, one uses families depending on other parameters, 
and often uses multiple persistence invariants in conjunction with each other 
to obtain useful information (see El HH]). In this paper we will find an
appropriate set of parameters and functions of those parameters for classification 
problems arising in the radiology of liver lesions. 

1.2. Our Work 

In this paper we present a framework for computing persistent homology on
images. This framework is flexible and allows the user to tailor the
filtrations to the application. We demonstrate this by applying the framework
to classifying our set of hepatic lesions, using the bottleneck distance to
compare lesions. We find that our method achieves results comparable to
existing methods, and our results demonstrate the possibilities of tailoring
multidimensional persistence to a specific application. Additionally, since we
use an 'off-the-shelf' implementation of the support vector machine to perform
the final classification, our results demonstrate that this framework can be
easily integrated with existing classification techniques.

2. Theory 

The work presented in this paper is an application of computational topology 
to medical image processing. As such, the theoretical material will be presented 
in a compact, informal manner following [SUH]. Those interested in the details
behind the theory are encouraged to read the associated references. 

2.1. Filtrations & Simplicial Complexes 

Let $V$ be a set of points. A $k$-simplex is a subset $S \subseteq V$ of size
$k + 1$. A geometric realization of a simplex consists of the convex hull of
$k + 1$ affinely independent points in $\mathbb{R}^d$, $d \geq k$. A 0-simplex
is a point or vertex, a 1-simplex is an edge, a 2-simplex is a triangle, and a
3-simplex is a tetrahedron. Higher dimensional simplices are difficult to
visualize and thus less familiar to our experience. A simplicial complex on $V$
is a set $K$ of simplices on $V$ such that the following holds: if the simplex
$\sigma \in K$ and the simplex $\tau \subseteq \sigma$, then $\tau \in K$. A
subcomplex of $K$ is a simplicial complex $L \subseteq K$. A filtration of a
complex $K$ is a nested sequence of complexes
$\emptyset = K_{\min} \subseteq K_{\min+1} \subseteq \cdots \subseteq K_{\max} = K$.
We call $K$ a filtered complex.
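
The closure condition in the definition of a simplicial complex can be checked mechanically. The following is a minimal sketch, assuming simplices are stored as tuples of vertex labels; the helper names `faces` and `is_simplicial_complex` are illustrative and not part of the original work.

```python
from itertools import combinations

def faces(simplex):
    """Return all non-empty proper subsets (faces) of a simplex."""
    s = tuple(sorted(simplex))
    return [frozenset(c) for r in range(1, len(s)) for c in combinations(s, r)]

def is_simplicial_complex(simplices):
    """Check that every face of every simplex is also in the collection."""
    K = {frozenset(s) for s in simplices}
    return all(f in K for s in K for f in faces(s))

# A filled triangle on vertices 0, 1, 2 together with all of its faces.
triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
# The same collection with the edge (1, 2) removed violates the closure rule.
broken = [s for s in triangle if s != (1, 2)]
```

Here `triangle` passes the check, while `broken` fails because the 2-simplex still has the missing edge as a face.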

2.2. Persistent Homology 

Homology is an algebraic invariant that counts some of the topological
attributes of a complex in terms of its Betti numbers, $\beta_i$. Specifically,
$\beta_0$ counts the number of connected components of $K$, $\beta_1$ counts
the number of tunnels through $K$ (two-dimensional empty space enclosed by a
one-dimensional curve), and $\beta_2$ counts the number of voids in $K$
(three-dimensional empty space enclosed by a two-dimensional surface).

Consider the filtered complex, a nested sequence of complexes, mentioned
above. We can track the topological changes that occur in this sequence via
persistent homology, an algebraic invariant which tracks the birth (appearance)
and death (disappearance) of topological attributes (Betti numbers) during the
evolution of this sequence. The birth ($a$) and death ($b$) of a feature can be
represented using the interval $[a, b] \subset \mathbb{R}$. Counting the number
of features in existence at a point $i$ (intervals which contain $i$) in the
persistent homology of the filtered complex gives the homology of $K_i$.
Generally, the longer a feature's lifetime, the more important the feature is
considered to be. Features can have infinite lifetimes if they appear in all
complexes after a given point in the sequence.
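
As a small illustration of the counting rule above, the sketch below recovers a Betti number at a filtration value by counting the intervals that contain it. It assumes a barcode is stored as a list of `(birth, death)` pairs with `float("inf")` for infinite lifetimes; the function name `betti_at` and the sample bars are illustrative only.

```python
def betti_at(barcode, i):
    """Count the intervals [birth, death] that contain the value i."""
    return sum(1 for birth, death in barcode if birth <= i <= death)

INF = float("inf")
# One feature that never dies and two short-lived features.
barcode = [(0.0, INF), (0.2, 0.5), (0.3, 0.4)]
```

For this toy barcode, three features are alive at 0.35 and only the infinite one remains at 0.9.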

These concepts can be extended to multidimensional filtrations [5]. For
example, in a two-dimensional filtration we have a family of simplicial
complexes such that $K_{i_1,i_2} \subseteq K_{j_1,j_2}$ if $i_1 \leq j_1$ and
$i_2 \leq j_2$. Now each feature is represented as a two-dimensional 'sheet'
belonging to $\mathbb{R}^2$. Except where noted, it can be assumed we are
referring to one-dimensional filtrations.

2.3. A metric on barcodes 

By computing the persistent homology of a filtered complex, we obtain a
descriptor of the complex in the form of a finite multi-set of intervals,
called a barcode. Thus, the barcode is useful both as a data structure for
storing the results of the computation of the persistent homology of a filtered
complex and as a visual representation of the persistent homology. A
quasi-metric $D$, which we define as a metric which can take infinite values,
can be defined over the collection of all barcodes, allowing us to compare
complexes using this quasi-metric on the space of barcodes. We follow [9] in
defining $D$.

Let $B_1$ and $B_2$ be two barcodes, i.e., finite multi-sets of intervals. For
two intervals $I$ and $J$, we define their dissimilarity $\delta(I, J)$ to be
the measure of their symmetric difference:
$\delta(I, J) = \mu(I \cup J - I \cap J)$, where $\mu$ denotes the
one-dimensional measure. Note that $I, J$ can be infinite intervals and
consequently $\delta(I, J)$ can be infinite. A matching $M$ on $B_1$ and $B_2$
is a set $M(B_1, B_2) \subseteq B_1 \times B_2 = \{(I, J) \mid I \in B_1, J \in B_2\}$,
where each interval occurs in at most one pair $(I, J)$. Note that in general,
$M$ will not have matched all intervals from $B_1$ and $B_2$. Let $N$ be the
set of unmatched intervals. We can now define $D_M(B_1, B_2)$, the distance
relative to $M$, as

$$D_M(B_1, B_2) = \sum_{(I, J) \in M} \delta(I, J) + \sum_{L \in N} \mu(L).$$

We define the quasi-metric $D(B_1, B_2)$ as the distance relative to the best
possible matching between $B_1$ and $B_2$:

$$D(B_1, B_2) = \min_M D_M(B_1, B_2).$$

This problem can be recast as a maximum weight bipartite matching problem and
solved using the Hungarian algorithm. See [S] for details. We will abuse
nomenclature slightly and refer to $D$ as the matching metric.
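
To make the definition concrete, the following minimal sketch computes the matching metric for small barcodes. It assumes that the distance relative to a matching is the sum of the dissimilarities of the matched pairs plus the lengths of the unmatched intervals, and that all bars are finite. It is not the Hungarian algorithm mentioned above: it searches all matchings exhaustively with memoisation, which is only practical for a handful of bars. All function names are illustrative.

```python
from functools import lru_cache

def length(I):
    """One-dimensional measure of the interval I = (birth, death)."""
    return I[1] - I[0]

def dissimilarity(I, J):
    """Measure of the symmetric difference of two finite intervals."""
    overlap = max(0.0, min(I[1], J[1]) - max(I[0], J[0]))
    return length(I) + length(J) - 2.0 * overlap

def matching_distance(B1, B2):
    """Minimise the matching cost over all matchings (exhaustive search)."""
    B1, B2 = tuple(B1), tuple(B2)

    @lru_cache(maxsize=None)
    def best(k, used):
        # 'used' is a bitmask of the intervals of B2 that are already matched.
        if k == len(B1):
            # Every remaining interval of B2 stays unmatched.
            return sum(length(J) for j, J in enumerate(B2)
                       if not (used >> j) & 1)
        # Option 1: leave B1[k] unmatched and pay its length.
        cost = length(B1[k]) + best(k + 1, used)
        # Option 2: match B1[k] with any interval of B2 not yet used.
        for j, J in enumerate(B2):
            if not (used >> j) & 1:
                cost = min(cost,
                           dissimilarity(B1[k], J) + best(k + 1, used | (1 << j)))
        return cost

    return best(0, 0)
```

For example, comparing `[(0.0, 1.0), (0.2, 0.3)]` with `[(0.0, 0.9)]` matches the two long bars (cost 0.1) and leaves the short bar unmatched (cost 0.1). A production implementation would replace the exhaustive search with a polynomial-time assignment solver.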

3. Calculations 

Each point in our dataset is an image, a two-dimensional collection of pixels 
(grayscale or intensity values) laid out on a grid with a set of contiguous pixels 
marked as lesion tissue. To use computational topology to analyze each image, 
we need a method of forming a filtered complex from an image. We can then 
use the theory outlined in Section 2 to create a barcode for each image and then
use the matching metric to compare various images. 

3.1. Forming a Simplicial Complex from an Image 

Given a two-dimensional image $I$, we begin with the empty complex $K$. We then
assign a vertex to each pixel in $I$ and add each of these vertices
(0-simplices) to $K$. We then form 1-simplices from these vertices if the
associated pixels are adjacent in $I$ (we treat diagonal pixels as adjacent).
We then add 2-simplices on the vertices where 3 pixels are mutually adjacent.
This forms a very regular simplicial complex, a mesh, with the only variations
between images being the boundary shape of $I$. As we are interested only in
the hepatic lesions identified in the image by the radiologist, we define $I$
to be the region of pixels contained within the lesion outline, plus a border
of healthy tissue around the edge of the lesion. We set the width of this
border to 5 pixels. We keep this border because it is useful to have some
healthy tissue included in the filtered complex for comparison with the lesion
tissue.
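
The construction above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used for the experiments: pixels are given as `(row, col)` pairs, adjacency is 8-connectivity, and only simplices up to dimension 2 are added, as in the text. The function names are hypothetical.

```python
from itertools import combinations

def adjacent(p, q):
    """Distinct pixels are adjacent if they differ by at most 1 per coordinate."""
    return p != q and abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1

def image_complex(pixels):
    """Build vertices, edges and triangles from a collection of (row, col) pixels."""
    vertices = sorted(set(pixels))
    # Quadratic scan over pixel pairs; adequate for a sketch, slow for big images.
    edges = [(p, q) for p, q in combinations(vertices, 2) if adjacent(p, q)]
    # Neighbour lookup so that triangles are found without testing every triple.
    nbrs = {v: set() for v in vertices}
    for p, q in edges:
        nbrs[p].add(q)
        nbrs[q].add(p)
    # Each triangle (p < q < r) is generated exactly once from its edge (p, q).
    triangles = [(p, q, r) for p, q in edges for r in nbrs[p] & nbrs[q] if r > q]
    return vertices, edges, triangles
```

On a 2x2 block of pixels all four pixels are mutually adjacent, so the sketch yields 4 vertices, 6 edges and 4 triangles.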

3.2. Image Filtrations 

After constructing the simplicial complex $K$ on the image $I$, we now want to
define a filtration on $K$. A natural approach is to assign a value to each
vertex. We can represent this as a function $f : V \to \mathbb{R}$, where $V$
is the vertex set of $K$. Let $f_{\min}$ and $f_{\max}$ denote the minimum and
maximum values attained by $f$ on $V$. We construct $K_i$ by including any
simplex $S \in K$ with the property $\forall v \in S, f(v) \leq i$:

$$K_i = \{S \in K : \forall v \in S, \; f(v) \leq i\} \qquad (1)$$

Intuitively, $f(v)$ represents the point at which $v$ enters the filtration and
$\max_{v \in S} f(v)$ determines the point at which a simplex $S \in K$ enters
the filtration. Now we have the filtered complex
$K_{f_{\min}} \subseteq K_{f_{\min}+1} \subseteq \cdots \subseteq K_{f_{\max}} = K$.
Notice that if we reverse the inequality in Equation (1) we get an equally
valid filtration,
$K_{f_{\max}} \subseteq K_{f_{\max}-1} \subseteq \cdots \subseteq K_{f_{\min}} = K$.
We will refer to these as the increasing and decreasing filtrations.
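
The sublevel construction of Equation (1) and its reversed counterpart can be written directly. This is a minimal sketch, assuming simplices are tuples of vertex labels and `f` is a dictionary from vertices to values; the names are illustrative.

```python
def entry_value(simplex, f):
    """A simplex enters the increasing filtration at its largest vertex value."""
    return max(f[v] for v in simplex)

def sublevel_complex(simplices, f, i):
    """Increasing filtration: simplices whose vertices all satisfy f(v) <= i."""
    return [s for s in simplices if entry_value(s, f) <= i]

def superlevel_complex(simplices, f, i):
    """Decreasing filtration: simplices whose vertices all satisfy f(v) >= i."""
    return [s for s in simplices if min(f[v] for v in s) >= i]

# A path a - b - c with one value per vertex.
f = {"a": 0.1, "b": 0.5, "c": 0.9}
K = [("a",), ("b",), ("c",), ("a", "b"), ("b", "c")]
```

At the threshold 0.5 the increasing filtration contains the edge `("a", "b")`, while the decreasing filtration contains the edge `("b", "c")`.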

As each vertex is associated with a pixel in the original image, it is natural
to use the pixel intensity (i.e., the grayscale value of each pixel) to assign
a value to each vertex. This forms the basis of what we will call the intensity
filtration. A toy example of the increasing intensity filtration is shown in
Figure 5a. The colors represent the point in the filtration when the vertices
and edges are added (we do not shade triangles for aesthetic reasons).
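
For the connected-component barcode, the increasing filtration can be processed with a union-find structure. The sketch below is one standard way to do this and is offered as an illustration only; it assumes vertex values in a dictionary and edges as vertex pairs, and it applies the usual elder rule (when two components merge, the one born later dies).

```python
def beta0_barcode(values, edges):
    """Component barcode of the increasing filtration given by vertex values.

    values: dict mapping vertex -> filtration value
    edges:  iterable of (u, v) pairs
    Returns sorted (birth, death) pairs; death is inf for surviving components.
    """
    parent = {v: v for v in values}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    bars = []
    # An edge enters the filtration when its later vertex enters.
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle, so no component dies
        t = max(values[u], values[v])
        # Elder rule: the root with the smaller birth value survives.
        old, young = sorted((ru, rv), key=lambda r: values[r])
        bars.append((values[young], t))
        parent[young] = old
    roots = {find(v) for v in values}
    bars.extend((values[r], float("inf")) for r in roots)
    return sorted(bars)

# Path a - b - c where the middle pixel is the brightest.
toy_values = {"a": 0.1, "b": 0.5, "c": 0.3}
toy_edges = [("a", "b"), ("b", "c")]
```

Zero-length bars, such as the one created when a vertex enters together with an edge that attaches it to an older component, are kept here and can be discarded afterwards if desired.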

We define an additional filtration by associating the distance from the lesion 
border, as given by the radiologist, to each pixel. We call this the border filtra- 
tion. The increasing border filtration produces an 'annulus' which grows until 
it fills the lesion. The decreasing filtration produces a misshapen 'disc' which 
expands from the center of the lesion. While this is clearly not topologically 
useful for classification, in practice the combination of the border filtration with 
the intensity filtration gives better classification results than using the intensity 
filtration alone. 
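
The text does not specify how the distance from the lesion border is measured. As one possible reading, the sketch below assigns to each pixel the number of 8-connected steps to the nearest outline pixel, computed by a multi-source breadth-first search. Both the step-count distance and the function name `border_distance` are assumptions made for illustration.

```python
from collections import deque

def border_distance(pixels, outline):
    """Breadth-first distance (8-connected steps) from each pixel to the outline."""
    pixels = set(pixels)
    dist = {p: 0 for p in outline if p in pixels}
    queue = deque(dist)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q in pixels and q not in dist:
                    dist[q] = dist[(r, c)] + 1
                    queue.append(q)
    return dist

# A 5x5 square region whose outline is its outermost ring of pixels.
square = [(r, c) for r in range(5) for c in range(5)]
ring = [(r, c) for r, c in square if r in (0, 4) or c in (0, 4)]
d = border_distance(square, ring)
```

Used as vertex values, these distances give the border filtration; thresholding from below grows a ring inward from the outline, and thresholding from above grows a region outward from the center.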




Figure 5: Constructing an increasing 1D-filtration on an image. Panels:
(b) $\beta_0$ barcode for above image; (c) $\beta_1$ barcode for above image.

To simplify the computational difficulties encountered with two-dimensional
filtrations, we use one-dimensional filtration slices to approximate the
two-dimensional filtrations. Let $K^1$ represent the border filtration and
$K^2$ represent the pixel intensity filtration. We divide the range of the
border filtration into 20 equally spaced slices. At each slice $i$, we use the
intensity filtration to compute the persistent homology of the subcomplex
$K^1_i$. This gives rise to the filtered complex
$K_{i,f_{\min}} = K^1_i \cap K^2_{f_{\min}} \subseteq \cdots \subseteq K_{i,f_{\max}} = K^1_i \cap K^2_{f_{\max}}$.

We can treat each of these one-dimensional barcodes as a different measurement
on the lesion. The choice of the increasing or decreasing filtration for each
of the border and intensity functions, together with the $\beta_0$ and
$\beta_1$ barcodes, gives eight barcodes at each slice, yielding a total of 160
barcodes computed on each lesion.
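
The count of 160 barcodes follows from enumerating the options. A short sketch of that bookkeeping is given below; the variable names and direction labels are illustrative.

```python
from itertools import product

slices = range(20)                        # slices of the border filtration
border_dirs = ("increasing", "decreasing")
intensity_dirs = ("increasing", "decreasing")
betti = (0, 1)                            # homology dimensions computed

# One configuration per barcode computed on a lesion.
configs = list(product(slices, border_dirs, intensity_dirs, betti))
per_slice = len(configs) // len(slices)
```

This gives 2 x 2 x 2 = 8 barcodes per slice and 20 x 8 = 160 barcodes per lesion.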

To account for differences in the pixel scaling, whether due to image
formatting or differing CT scanners, we normalize the pixel range to lie
between 0 and 1. We truncate infinite bars at 1.1 so that a differing number of
infinite bars does not immediately separate two lesions (some hemangiomas, for
example, have two dense regions while others have 3 or 4).
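
Both preprocessing steps are simple to express. The sketch below assumes a linear rescaling of the intensities onto [0, 1], which the text does not spell out, and caps infinite deaths at 1.1 as described; the function names are illustrative.

```python
def normalize(values):
    """Rescale a dict of pixel intensities linearly onto [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid division by zero for a constant image
    return {p: (v - lo) / span for p, v in values.items()}

def truncate(barcode, cap=1.1):
    """Replace infinite deaths by a finite cap so every bar has finite length."""
    return [(b, min(d, cap)) for b, d in barcode]
```

After truncation every bar has finite length, so the dissimilarity between any two bars is finite as well.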

3.3. Feature Generation and Machine Learning 

To make use of existing machine learning techniques, it is necessary to pro- 
vide a vector of measurements for each lesion. Using the barcode distance, we 
can create a vector of relative measurements by computing the matching dis- 
tance between each lesion and all other lesions (including itself). In other words, 
we use the entire set of 132 images as the comparison set to generate our feature 
vector. We do this even when restricting ourselves to a smaller subset of lesions. 
This allows us to obtain information even from the lesion type sets that are too 
small for classification. 

Since we have 160 barcodes for each lesion, we choose to sum the 160 dis- 
tances to create a vector of size 132 for each lesion. This vector can then be 
used in traditional machine learning algorithms. We choose to use an imple- 
mentation of the support vector machine (SVM) called LibSVM to test the 
classification accuracy on various subsets of the data with the one-dimensional 
and two-dimensional filtrations [7].
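
The feature construction can be sketched as follows, assuming each lesion is represented by a list of barcodes in a fixed order (one per measurement) and that some barcode distance function is supplied. The name `feature_matrix` and the toy distance used in the example are placeholders, not part of the original work.

```python
def feature_matrix(lesions, distance):
    """Row i holds the summed barcode distances from lesion i to every lesion.

    lesions:  list in which each entry is a list of barcodes (one per measurement)
    distance: function taking two barcodes and returning a number
    """
    n = len(lesions)
    X = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Sum the distances between corresponding barcodes of the two lesions.
            total = sum(distance(a, b) for a, b in zip(lesions[i], lesions[j]))
            X[i][j] = X[j][i] = total  # the distance is symmetric
    return X

# Toy stand-in distance (difference in bar counts), for illustration only.
toy_distance = lambda a, b: abs(len(a) - len(b))
toy_lesions = [
    [[(0, 1)], [(0, 1), (0, 2)]],  # lesion 0: two barcodes
    [[], [(0, 1)]],                # lesion 1: two barcodes
]
```

With 132 lesions and 160 barcodes each, row `i` of the result is the 132-dimensional feature vector of lesion `i`.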

4. Results 

For the purpose of building intuition, we used classical multidimensional
scaling (CMDS) on the distance matrix, using the above feature vectors as
columns, to produce 2D and 3D visualizations of the lesions, shown in Figure 6.
Note that the axes simply give the coordinates of the embedding given by CMDS
and that the vertical axis in Figure 6b is pointed at the reader in Figure 6a.

We use an SVM and leave-one-out cross-validation (LOOCV) to test the efficacy
of our methods. Because the dataset available to us is very unbalanced (see
Section 1), we present results from four different subsets of the data. The
first is the full dataset. In the second subset (HcHeCM), we use the HCCs, the
hemangiomas, the cysts, and the metastases. In the third subset (HeCM) we test
on the hemangiomas, cysts, and metastases. This subset is most comparable to
the sets used in previous classification works [13]. In the final subset (CM)
we remove the hemangiomas, leaving only the cysts and metastases.
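
The evaluation protocol can be sketched independently of the classifier. In the code below, `loocv_accuracy` holds out one lesion at a time; the classifier passed in is a 1-nearest-neighbour stand-in used only to keep the example self-contained, and it is not the SVM used for the reported results. The data are toy values.

```python
def loocv_accuracy(X, y, fit_predict):
    """Leave-one-out cross-validation accuracy.

    fit_predict(train_X, train_y, test_x) must return a predicted label.
    """
    correct = 0
    for i in range(len(X)):
        # Train on everything except sample i, then test on sample i.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

def nearest_neighbour(train_X, train_y, x):
    """Stand-in classifier (NOT the SVM from the text): 1-nearest neighbour."""
    sq = lambda a, b: sum((s - t) ** 2 for s, t in zip(a, b))
    best = min(range(len(train_X)), key=lambda k: sq(train_X[k], x))
    return train_y[best]

toy_X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_y = ["cyst", "cyst", "cyst", "metastasis", "metastasis", "metastasis"]
```

Any classifier with the same `fit_predict` interface, including a wrapper around an SVM library, can be substituted for the stand-in.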
Figure 6: Visualizations of Topological Features. Panels: (a) A 2D View of 132
Hepatic Lesions; (b) A 3D View of 132 Hepatic Lesions.
Table 1: SVM Classification Accuracies for 1D and 2D Filtrations

Filtration        Full      HcHeCM    HeCM      CM
1D (Intensity)    55.30%    59.66%    63.89%    80.00%
2D                66.67%    72.27%    80.56%    85.56%



We used the Gaussian kernel (also called the radial basis function),
$e^{-\sigma^2 \|u - v\|^2 / 2}$, in combination with the SVM. We performed an
exponential parameter sweep to find reasonable values for the kernel parameter
$\sigma$ and the SVM cost parameter $C$ for each data set. In Table 2 we show
the classification accuracy for each lesion type in the HeCM dataset.
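
An exponential parameter sweep can be sketched as below. The kernel is written in the common single-parameter form exp(-gamma * ||u - v||^2), which is an assumption about the parametrisation, and the exponent ranges are hypothetical, since the text does not list the values that were searched.

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel in the gamma form: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def exponential_grid(low_exp, high_exp, step=2, base=2.0):
    """Exponentially spaced values base**low_exp, ..., base**high_exp."""
    return [base ** e for e in range(low_exp, high_exp + 1, step)]

# Hypothetical sweep ranges, chosen for illustration only.
C_values = exponential_grid(-5, 15)
gamma_values = exponential_grid(-15, 3)
grid = [(C, g) for C in C_values for g in gamma_values]
```

Each `(C, gamma)` pair in `grid` would be scored by cross-validation, and the best-scoring pair kept for the data set at hand.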

Table 2: HeCM % Classification Accuracy by Lesion Type

Filtration    % of HeCM    % of Heman.    % of Cysts    % of Metas.
1D            63.89%       27.78%         77.78%        64.44%
2D            80.56%       72.22%         88.89%        75.56%



Upon examination of the lesions which were misclassified, we noticed that many
of them were significantly larger than the median lesion area (1285.5 pixels)
of the dataset. Taking HeCM, we performed the same analysis as above, but
removed lesions whose pixel area exceeded various thresholds. The results are
summarized in Table 3.

Table 3: Classification by Lesion Size of HeCM

Lesion Size by Area    % Accu.    # of Heman.    # of Cysts    # of Metas.
All                    80.56%     18             45            45
< 10000 px             83.50%     18             42            43
< 5000 px              86.96%     16             39            37
< 2500 px              86.25%     14             32            34
< 1250 px              91.53%     8              28            23



5. Discussion 

A large portion of the misclassifications are due to the difficulty of
normalizing for the number of pixels present in the lesion. A large lesion has
many more potential features than a small lesion. Since the matching metric
involves a matching of bars, a larger lesion will be at a greater distance from
a smaller lesion, even if the tissue structure is the same, because of the
unmatched bars. A method of accounting for the differences in lesion size would
improve our results considerably. In addition, our results are potentially
sensitive to the lesions available for comparison. A different comparison set
(or even a synthetic comparison set) could improve or degrade the results.
Addressing both of these issues is a potential direction for future research.

Nevertheless, the current results are comparable to more traditional feature 
based classification methods [331 [17] . This demonstrates that multidimensional 
persistence is a viable candidate for integration with other methods of developing
features in radiological data. In particular, our methods may be complementary 
to the standard techniques currently in use. 

These results also demonstrate the power of combining topology and geome- 
try via persistence. In this case, it is clear that using a radial geometry (via the 
border filtration) significantly improves the classifying power of barcodes,
especially in the case of hemangiomas, which are characterized by large homogeneous
regions on the periphery of the lesion. This demonstrates one way in which this 
methodology captures radiological observations. 

This procedure is flexible enough to be used in a variety of contexts. Any 
method of assigning values to each pixel could be used as a filtration to generate 
a barcode. Filtering could be done on the image or on the barcodes (for example, 
removing bars of small length). If a different geometric filter is called for in an 
application, it can be easily accommodated by our methods. 
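
As an example of filtering on the barcodes, the sketch below removes bars of small length. The function name and the threshold value in the example are illustrative assumptions; no such threshold is reported in the text.

```python
def drop_short_bars(barcode, min_length):
    """Keep only the bars whose lifetime is at least min_length."""
    return [(b, d) for b, d in barcode if d - b >= min_length]

# Two long-lived bars and one short-lived bar.
sample = [(0.0, 1.1), (0.2, 0.25), (0.3, 0.6)]
filtered = drop_short_bars(sample, 0.1)
```

Here the bar `(0.2, 0.25)` is discarded, while the two longer bars are kept for the distance computation.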

Additionally, the output from our algorithm was ready for input into an 
existing machine learning algorithm. This demonstrates that our algorithm is 
easily integrated into existing computational machinery and can be combined 
with more traditional methods of feature generation. 

6. Conclusion 

We have implemented and tested a methodology for the classification of hep- 
atic lesions. Using multidimensional persistent homology and a support vector 
machine, we demonstrated the ability of multidimensional persistence to com- 
bine different features of interest for improved results over a one-dimensional 
filtration. Our computational framework can be used in a variety of image clas- 
sification problems outside of lesion classification and can be tailored to the 
specific application by changing the filtrations used on the image. We achieved 
comparable results to the traditional classification methods of hepatic lesions 
and, because our methods are topologically based, this makes them good can- 
didates for integration with classical non-topologically based image processing 
techniques. 